Two-Stage Consistency Algorithm

Algorithm Flowchart

The following diagram illustrates the two-stage sequential consistency algorithm used when use_twostage = TRUE in forestsearch().

flowchart TD
    A[/"Candidate Subgroup m"/] --> B["<b>STAGE 1: SCREENING</b><br/>Run n.splits.screen splits<br/>(default: 30)"]
    
    B --> C{{"Consistency <<br/>screen.threshold?"}}
    
    C -->|Yes| D["❌ <b>FAIL</b><br/>Candidate eliminated<br/>(clearly non-viable)"]
    
    C -->|No| E["<b>STAGE 2: SEQUENTIAL EVALUATION</b><br/>Initialize with Stage 1 results"]
    
    E --> F["Run batch of splits<br/>(batch.size = 20)"]
    
    F --> G["Compute Wilson CI<br/>for consistency"]
    
    G --> H{{"CI_lower ≥<br/>threshold?"}}
    
    H -->|Yes| I["✅ <b>PASS</b><br/>Early stop<br/>(95% confident above)"]
    
    H -->|No| J{{"CI_upper <<br/>threshold?"}}
    
    J -->|Yes| K["❌ <b>FAIL</b><br/>Early stop<br/>(95% confident below)"]
    
    J -->|No| L{{"Reached<br/>n.splits.max?"}}
    
    L -->|No| F
    
    L -->|Yes| M{{"Final consistency ≥<br/>threshold?"}}
    
    M -->|Yes| N["✅ <b>PASS</b>"]
    M -->|No| O["❌ <b>FAIL</b>"]
    
    style A fill:#e1f5fe
    style B fill:#fff3e0
    style E fill:#fff3e0
    style D fill:#ffcdd2
    style K fill:#ffcdd2
    style O fill:#ffcdd2
    style I fill:#c8e6c9
    style N fill:#c8e6c9
Figure 1: Two-Stage Sequential Consistency Algorithm
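
The control flow in Figure 1 can be sketched in base R. This is a minimal illustration, not the package's implementation: `twostage_sketch()`, `wilson_ci()`, and `run_split` are hypothetical names, `run_split` stands in for whatever evaluates a single split and is assumed to return `TRUE` or `FALSE`, and the `min.valid.screen` check is not modelled.

```r
# Wilson score interval for x successes out of n trials (base R only).
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Hypothetical sketch of the flow in Figure 1.
# run_split: assumed function of no arguments returning TRUE if the split
#            is consistent and FALSE otherwise.
twostage_sketch <- function(run_split, threshold, screen.threshold,
                            n.splits.max, n.splits.screen = 30,
                            batch.size = 20, conf.level = 0.95) {
  draw <- function(k) {
    vapply(seq_len(k), function(i) isTRUE(run_split()), logical(1))
  }
  out <- function(decision, stage) {
    list(decision = decision, stage = stage,
         n.splits = length(hits), consistency = mean(hits))
  }

  # Stage 1: screening. Eliminate clearly non-viable candidates.
  hits <- draw(n.splits.screen)
  if (mean(hits) < screen.threshold) return(out("FAIL", "screen"))

  # Stage 2: sequential batches, initialized with the Stage 1 results.
  while (length(hits) < n.splits.max) {
    k <- min(batch.size, n.splits.max - length(hits))
    hits <- c(hits, draw(k))
    ci <- wilson_ci(sum(hits), length(hits), conf.level)
    if (ci[["lower"]] >= threshold) return(out("PASS", "early stop"))
    if (ci[["upper"]] < threshold) return(out("FAIL", "early stop"))
  }

  # Maximum number of splits reached: decide on the point estimate.
  out(if (mean(hits) >= threshold) "PASS" else "FAIL", "max splits")
}

# Example with a simulated candidate whose splits succeed 97% of the time.
set.seed(42)
twostage_sketch(function() runif(1) < 0.97, threshold = 0.90,
                screen.threshold = 0.76, n.splits.max = 500)
```

The returned list records which exit was taken (`"screen"`, `"early stop"`, or `"max splits"`) and how many splits were actually run, which is the quantity the two-stage design tries to reduce.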

Parameter Summary

flowchart LR
    subgraph Stage1["<b>Stage 1: Screening</b>"]
        P1["n.splits.screen<br/><i>default: 30</i>"]
        P2["screen.threshold<br/><i>auto: ~pcons - 2.5 SE</i>"]
        P3["min.valid.screen<br/><i>default: 10</i>"]
    end
    
    subgraph Stage2["<b>Stage 2: Sequential</b>"]
        P4["batch.size<br/><i>default: 20</i>"]
        P5["conf.level<br/><i>default: 0.95</i>"]
        P6["n.splits.max<br/><i>= fs.splits</i>"]
    end
    
    Stage1 --> Stage2
    
    style Stage1 fill:#fff3e0
    style Stage2 fill:#e3f2fd
Figure 2: Two-Stage Algorithm Parameters
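
To get a feel for the automatic `screen.threshold` shown in Figure 2, the sketch below evaluates `pcons - 2.5 SE` numerically. The exact formula used by the package is not given here, so this assumes SE is the binomial standard error of a proportion at `pcons` over `n.splits.screen` splits; `screen_threshold_sketch()` is a hypothetical helper, not a package function.

```r
# Assumption: SE = sqrt(pcons * (1 - pcons) / n.splits.screen), i.e. the
# binomial standard error at the consistency threshold over Stage 1 splits.
screen_threshold_sketch <- function(pcons, n.splits.screen = 30, k = 2.5) {
  se <- sqrt(pcons * (1 - pcons) / n.splits.screen)
  max(0, pcons - k * se)  # keep the result inside [0, 1]
}

round(screen_threshold_sketch(0.90), 3)
#> [1] 0.763
```

Under this assumption, a candidate with a consistency threshold of 0.90 would need roughly 23 of 30 consistent screening splits to reach Stage 2.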

Early Stopping Logic

After each batch, a Wilson score confidence interval for the consistency rate is computed from the splits accumulated so far, and its bounds drive the early stopping decision:

flowchart TD
    subgraph CI["Wilson Score CI at conf.level = 0.95"]
        A["Current: n_success / n_total"]
        A --> B["Compute 95% CI<br/>[lower, upper]"]
    end
    
    B --> C{{"lower ≥ threshold"}}
    C -->|Yes| D["<b>PASS</b><br/>95% confident<br/>consistency ≥ threshold"]
    
    C -->|No| E{{"upper < threshold"}}
    E -->|Yes| F["<b>FAIL</b><br/>95% confident<br/>consistency < threshold"]
    
    E -->|No| G["<b>CONTINUE</b><br/>Need more data"]
    
    style D fill:#c8e6c9
    style F fill:#ffcdd2
    style G fill:#fff9c4
Figure 3: Early Stopping Decision Logic
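
The decision rule in Figure 3 can be written in a few lines of base R. The helper names `wilson_ci()` and `early_stop_decision()` are illustrative assumptions, not functions exported by the package; the interval itself is the standard Wilson score formula.

```r
# Wilson score interval for a binomial proportion (base R only).
wilson_ci <- function(n_success, n_total, conf.level = 0.95) {
  stopifnot(n_total > 0, n_success >= 0, n_success <= n_total)
  z <- qnorm(1 - (1 - conf.level) / 2)
  p_hat <- n_success / n_total
  denom <- 1 + z^2 / n_total
  centre <- (p_hat + z^2 / (2 * n_total)) / denom
  half <- z * sqrt(p_hat * (1 - p_hat) / n_total +
                     z^2 / (4 * n_total^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Map the interval onto the three outcomes of Figure 3.
early_stop_decision <- function(n_success, n_total, threshold,
                                conf.level = 0.95) {
  ci <- wilson_ci(n_success, n_total, conf.level)
  if (ci[["lower"]] >= threshold) return("PASS")
  if (ci[["upper"]] < threshold) return("FAIL")
  "CONTINUE"
}

early_stop_decision(50, 50, threshold = 0.90)  # lower bound ~0.929
#> [1] "PASS"
early_stop_decision(30, 50, threshold = 0.90)  # upper bound ~0.724
#> [1] "FAIL"
early_stop_decision(46, 50, threshold = 0.90)  # interval ~[0.812, 0.968]
#> [1] "CONTINUE"
```

The Wilson interval is used rather than the simpler Wald interval because it stays inside [0, 1] and behaves sensibly when the observed rate is close to 1, which is the typical situation for a consistency threshold such as 0.90.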

Comparison: Fixed vs Two-Stage

flowchart LR
    subgraph Fixed["<b>Fixed-Sample</b><br/>(use_twostage = FALSE)"]
        F1["Run exactly<br/>fs.splits splits"] --> F2["Compute<br/>consistency"] --> F3["Pass/Fail<br/>decision"]
    end
    
    subgraph TwoStage["<b>Two-Stage</b><br/>(use_twostage = TRUE)"]
        T1["Stage 1<br/>Screen"] --> T2["Stage 2<br/>Sequential"] --> T3["Early stop<br/>or max splits"]
    end
    
    Fixed -.->|"Predictable runtime<br/>Exact reproducibility"| Use1["Regulatory<br/>submissions"]
    TwoStage -.->|"3-10x faster<br/>Adaptive"| Use2["Exploratory<br/>analysis"]
    
    style Fixed fill:#e8eaf6
    style TwoStage fill:#e8f5e9
Figure 4: Algorithm Comparison

Code Example

# Two-stage with custom parameters
result <- forestsearch(
  df.analysis = trial_data,
  hr.threshold = 1.25,
  pconsistency.threshold = 0.90,
  fs.splits = 500,
  use_twostage = TRUE,
  twostage_args = list(
    n.splits.screen = 40,
    batch.size = 25,
    conf.level = 0.95
  ),
  details = TRUE
)

# Check which algorithm was used
result$grp.consistency$algorithm
#> [1] "twostage"

When Two-Stage Provides Maximum Benefit

| Scenario | Expected Speedup |
|---|---|
| Many candidates clearly fail at Stage 1 | 5-10x |
| True consistency well above threshold | 3-5x |
| True consistency well below threshold | 3-5x |
| Large fs.splits (>200) | Higher benefit |
| Most candidates near threshold | Minimal |
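
The pattern in these scenarios can be explored with a small simulation. The sketch below is an illustration under simplifying assumptions, not a benchmark of the package: each split is modelled as an independent Bernoulli draw with a fixed true consistency, the screening threshold is set by hand to 0.76, `n_splits_used()` is a hypothetical helper, and the "speedup" counts splits rather than wall-clock time.

```r
# Wilson score interval for x successes out of n trials.
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Number of splits consumed by one simulated two-stage run.
# Assumption: splits are independent with success probability p_true.
n_splits_used <- function(p_true, threshold = 0.90, screen.threshold = 0.76,
                          n.splits.screen = 30, batch.size = 20,
                          n.splits.max = 500, conf.level = 0.95) {
  x <- rbinom(1, n.splits.screen, p_true)
  n <- n.splits.screen
  if (x / n < screen.threshold) return(n)  # eliminated at Stage 1
  while (n < n.splits.max) {
    k <- min(batch.size, n.splits.max - n)
    x <- x + rbinom(1, k, p_true)
    n <- n + k
    ci <- wilson_ci(x, n, conf.level)
    if (ci[["lower"]] >= threshold || ci[["upper"]] < threshold) break
  }
  n
}

set.seed(123)
p_grid <- c(0.50, 0.80, 0.90, 0.99)  # true consistency of the candidate
mean_splits <- sapply(p_grid, function(p) {
  mean(replicate(200, n_splits_used(p)))
})

data.frame(
  true.consistency = p_grid,
  mean.splits = mean_splits,
  split.ratio.vs.fixed = 500 / mean_splits  # fixed-sample runs all 500
)
```

Candidates far from the threshold stop after one or two looks, while a candidate whose true consistency equals the threshold tends to run much closer to `n.splits.max`, which is why the benefit is minimal when most candidates sit near the threshold.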